-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add PdfWriter.remove_objects_from_page(page: PageObject, to_delete: ObjectDeletionFlag) #1648
Conversation
Codecov ReportBase: 92.25% // Head: 92.40% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #1648 +/- ##
==========================================
+ Coverage 92.25% 92.40% +0.15%
==========================================
Files 33 33
Lines 6431 6468 +37
Branches 1286 1287 +1
==========================================
+ Hits 5933 5977 +44
Misses 315 315
+ Partials 183 176 -7
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
I've completed the test file with some Form data |
@MartinThoma |
pypdf/_writer.py
Outdated
@@ -1808,85 +1808,101 @@ def removeLinks(self) -> None: # deprecated | |||
deprecation_with_replacement("removeLinks", "remove_links", "3.0.0") | |||
return self.remove_links() | |||
|
|||
def remove_img_or_text( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather make this a private method:
def remove_img_or_text( | |
def _remove_img_or_text( |
(that needs some adjustments in other parts as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to leave it public : currently remove_images()
and remove_text()
are cleaning the full document : this will provide a solution to just clean one page.
However I agree the name is too tricky : I propose to rename the function into remove_image_or_text_from_page()
Your opinion ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I understand! So it's actually adding a new feature :-)
I don't like the image_or_text
part of the method name. Maybe we can change the signature to this:
class ObjectDeletionFlag(enum.IntFlag):
TEXT = enum.auto()
IMAGES = enum.auto()
...
def remove_objects_from_page(
self, page: Union[PageObject, DictionaryObject], to_delete: ObjectDeletionFlag
) -> None:
I see some advantages in the proposed interface:
- We could add the capability to delete annotations (maybe even specific annotation types) in the same method
- We could allow users to specify multiple flags at one method call
- Personal preference: When I see an
_or_
in a method name, I tend to think that those should be two methods 😄 - Personal preference: I think that the current use of
del_image=False
=> delete text is not intuitive. I would like to avoid that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def remove_objects_from_page( self, page: Union[PageObject, DictionaryObject], to_delete: ObjectDeletionFlag ) -> None:
Sounds good to me.
I see some advantages in the proposed interface:
1. We could add the capability to delete annotations (maybe even specific annotation types) in the same method
Text and Images requires to remove to parse content, that's why they are specific, however from a interface point of view we can merge them.
2. We could allow users to specify multiple flags at one method
good idea implemented.
3. Personal preference: When I see an `_or_` in a method name, I tend to think that those should be two methods 😄
not iaw my rules...😉🤣🤣🤣🤣
4. Personal preference: I think that the current use of `del_image=False` => delete text is not intuitive. I would like to avoid that.
btw I hate the name of the function remove_links()
which name is not describing it properly. Shouldn't we have remove_annots()
and remove_links()
- for links only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to look it up - https://pypdf.readthedocs.io/en/latest/modules/PdfWriter.html#pypdf.PdfWriter.remove_links
Yes, that is exactly this kind of unexpected dual-purpose which I don't like -.-
I want to keep the amount of breaking changes low for this year. Give people a chance to adapt. But we should make a list of such things to be cleaned-up in future.
Additionally, the proposed remove_objects_from_page could support links and annotations as well (just by adding it to the ObjectDeletionFlag
/ letting users provide that value.
This reverts commit 0928c8d.
965d9b7
to
3ed6653
Compare
for extra tests |
def remove_objects_from_page( | ||
self, | ||
page: Union[PageObject, DictionaryObject], | ||
to_delete: Union[ObjectDeletionFlag, Iterable[ObjectDeletionFlag]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make our lives here easier and not allow the Iterables (tuple / list). It is a flag and thus people should use it.
If they don't know it, they will just do multiple calls (which is fine as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see how to change keeping the code just affecting annotations/comments...
@pubpub-zz I would like to get this merged in the release today. Would it be OK for you if I take care of the remaining (rather stylistic) changes within your PR? |
Co-authored-by: Martin Thoma <[email protected]>
Co-authored-by: Martin Thoma <[email protected]>
Co-authored-by: Martin Thoma <[email protected]>
Co-authored-by: Martin Thoma <[email protected]>
Co-authored-by: Martin Thoma <[email protected]>
Co-authored-by: Martin Thoma <[email protected]>
Co-authored-by: Martin Thoma <[email protected]>
Co-authored-by: Martin Thoma <[email protected]>
Co-authored-by: Martin Thoma <[email protected]>
Co-authored-by: Martin Thoma <[email protected]>
last point remaining is about LIST/TUPLES |
@MartinThoma All yours 😊 |
Very nice PR! I've just merged it and will release it today :-) Good work @pubpub-zz 👏 |
New Features (ENH) - Add reader.attachments public interface (#1611, #1661) - Add PdfWriter.remove_objects_from_page(page: PageObject, to_delete: ObjectDeletionFlag) (#1648) - Allow free-text annotation to have transparent border/background (#1664) Bug Fixes (BUG) - Allow decryption with empty password for AlgV5 (#1663) - Let PdfWriter.pages return PageObject after calling `clone_document_from_reader()` (#1613) - Invalid font pointed during merge_resources (#1641) Robustness (ROB) - Cope with invalid objects in IndirectObject.clone (#1637) - Improve tolerance to invalid Names/Dests (#1658) - Decode encoded values in get_fields (#1636) - Let PdfWriter.merge cope with missing "/Fields" (#1628) [Full Changelog](3.4.1...3.5.0)
fix remove_text to set contents as indirect_objects (iaw PDF Specification)
wipe out text also in XObject forms
Same issues + refactoring for remove_images()
closes #1644
closes #1650